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Abstract 

Classical multiscale analysis based on wavelets has a number of successful 
applications, e.g. in data compression, fast algorithms, and noise removal. 
Wavelets, however, are adapted to point singularities, and many phenom- 
ena in several variables exhibit intermediate-dimensional singularities, such 
as edges, filaments, and sheets. This suggests that in higher dimensions, 
wavelets ought to be replaced in certain applications by multiscale analysis 
adapted to intermediate-dimensional singularities. 

My lecture described various initial attempts in this direction. In partic- 
ular, I discussed two approaches to geometric multiscale analysis originally 
arising in the work of Harmonic Analysts Hart Smith and Peter Jones (and 
others) : (a) a directional wavelet transform based on parabolic dilations; and 
(b) analysis via anistropic strips. Perhaps surprisingly, these tools have po- 
tential applications in data compression, inverse problems, noise removal, and 
signal detection; applied mathematicians, statisticians, and engineers are ea- 
gerly pursuing these leads. 

Note: Owing to space constraints, the article is a severely compressed 
version of the talk. An extended version of this article, with figures used in 
the prese ntation, is available online at: 



http://www-stat. Stanford, edu/^ donoho /Lectures/ICM200', 
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1. Prologue 

Since the last ICM, we have lost three great mathematical scientists of the 
twentieth century: Alberto Pedro Calderon (1922-1999), John Wilder Tukey (1915- 
2000) and Claude Elwood Shannon (1916-2001). Although these three are not 
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typically spoken of as a group, I find it fitting to mention these three together 
because each of these figures symbolizes for me one aspect of the unreasonable 
effectiveness of harmonic analysis. 

Indeed we are all aware of the birth of harmonic analysis in the nineteenth 

century as a tool for understanding of the equations of mathematical physics, but it 
is striking how the original tools of harmonic analysis have frequently (a) changed, 
and (b) been applied in ways the inventors could not have anticipated. Thus, (a) 
harmonic analysis no longer means 'Fourier Analysis' exclusively, because wavelet 
and other forms of decompositions have been invented by modern harmonic analysts 
(such as Calderon); and (b) harmonic analysis finds extensive application outside 
of mathematical physics, as a central infrastructural element of the modern infor- 
mation society, because of the ubiquitous applications of the fast Fourier transform 
(after Tukey) and Fourier transform coding (after Shannon). 

There is a paradox here, because harmonic analysts are for the most part 
not seeking applications, or at any rate, what they regard as possible applications 
seem not to be the large-scale applications that actually result. Hence the impact 
achieved by harmonic analysis has often not been the intended one After meditat- 
ing for a while on what seems to be the 'unreasonable' effectiveness of harmonic 
analysis, I have identified what seems to mc a chain of argumentation that renders 
the 'unreasonable' at least 'plausible'. The chain has two propositions: 

• Information has its own architecture. Each data source, whether imagery, 
sound, text, has an inner architecture which we should attempt to discover 
and exploit for applications such as noise removal, signal recovery, data com- 
pression, and fast computation. 

• Harmonic Analysis is about inventing and exploring architectures for infor- 
mation. Harmonic analysts have always created new architectures for decom- 
position, rearrangement and reconstruction of operators and functions. 

In short, the inventory of architectures created by harmonic analysis amounts to an 
intellectual patrimony which modern scientists and engineers can fruitfully draw 
upon for inspiration as they pursue applications. Although there is no necessary 
connection between the architectures that harmonic analysts are studying and the 
architectures that information requires, it is important that we have many examples 
of useful architectures available, and harmonic analysis provides many of these. 
Occasionally, the architectures already inventoried by harmonic analysts will be 
exactly the right ones needed for specific applications. 

I stress that the 'externally professed goals' of harmonic analysis in recent 
decades have always been theorems, e.g. about the almost everywhere convergence 
of Fourier Series, the boundedness of Bochner-Riesz summation operators, or the 
boundedness of the Cauchy integral on chord-arc curves. These externally professed 
goals have, as far as I know, very little to do with applications where harmonic 
analysis has had wide scale impact. Nevertheless, some harmonic analysts are aware 
of the architectural element in what they do, and value it highly. As R.R. Coifman 
has pointed out to me in private communication: 

"The objective of Zygmund, Calderon and their school was not the 
establishment of new theorems by any means possible. It was often to 
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take known results that seemed like magic — e.g. because of the way 
they used complex variables methods — and tear them apart, finding 
the underlying structures and their inner interactions that made it abso- 
lutely clear what was going on. The test of understanding was measured 
by the ability to prove an estimate." 

In short, the goal was to find the right architecture, not merely to find the 
right estimate. 

2. Overview 

In my lecture, I was able to discuss the possibility that a coherent subject 
of Geometric Multiscale Analysis (GMA) can be developed - a subject spanning 
both mathematics and a wide range of applications. It is at this point unclear what 
the boundaries of the subject will be, but perhaps the speculative nature of what 
I had to say will excite the interest of some readers. I found it useful to orga- 
nize the presentation around the Calderon reproducing formula, which gave us the 
continuous wavelet transform, but also can be adapted to give us other multiscale 
transforms with interesting geometric aspects. The several different information 
architectures I described give an intuitive understanding of what GMA might con- 
sist of. In the article below, I will review some of the achievements of classical 
1-dimensional multiscale analysis (wavelet analysis) starting in the 1980's, both the 
mathematical achievements and the extensive applications; then I will as a warm-up 
discuss reasons that we need alternatives to 1-dimensional multiscale analysis and 
its straightforward d-dimensional extensions, and some ideas such as ridgelets, that 
point in the expected directions. In my lecture, I was able to discuss two harmonic 
analysis results of the 1990's - Hart Smith's "Hardy space for FIO's" and Peter 
Jones' "Travelling Salesman" theorem. Both results concern the higher-dimensional 
setting, where it becomes possible to bring in geometric ideas. I suggested that, 
in higher dimensions, there are interesting, nontrivial, nonclassical, geometric mul- 
tiscale architectures, with applications paralleling the one-dimensional case. I was 
able to sketch some developing applications of these post-classical architectures. If 
these applications can be developed as extensively as has been done for classical 
multiscale analysis, the impacts may be large indeed. In this article, I really have 
space only to mention topics growing out of my discussion of Hart Smith's paper. 
For an extended version of the article, covering the talk more fully, see |^ . 

Note: Below we make a distinction between stylized applications (idealized 
applications in mathematical models) and actual applications (specific contributions 
to scientific discourse and technological progress); we always describe the two in 
separate subsections. 

3. Classical multiscale analysis 

An efficient way to introduce classical multiscale analysis is to start from 
Calderon's reproducing formula, or as commonly called today, the Continuous 
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Wavelet Transform. We suppose we have a real- valued function / : R i-^ R which 
we want to decompose into contributions from various scales and locations. We take 
with a wavelet, an oscillatory real- valued function ip{t) satisfying the Calderon ad- 
missibility condition imposed on the Fourier transform -0 as I^('?^)Pt ~ 2"'' 
7^ 0. We translate and dilate according to {ipa,b){t) = i^Ht — h)la)/ ^Ja. We per- 
form Wavelet Analysis by 'hitting' the function against all the different wavelets, 
obtaining Wf{a,b) = {^Ja,b,f)', Wf is called the Continuous Wavelet Transform 
(CWT). The CWT contains all the information necessary to reconstruct /, so we 
can perform Wavelet Synthesis by integrating overall all scales and locations, 
summing up wavelets with appropriate coefficients. 

fit) - J Wf{a,b)i;a,bit)Kdadb). 

Here fj,{dadb) is the appropriate reference measure, in this case The 'tightness' 

of the wavelet transform as a characterisation of the properties of / is expressed by 
the Parseval-type relation ^ Wf{a,bf ^i{dadb) = J fltfdt. See also 0, 1^, H. 

3.1. Mathematical results 

The CWT maps / into a time-scale plane; by measuring properties of this time- 
scale portrait we can obtain norms on functions which lead to interesting theories 
of functional spaces and their properties; there are two broad scales of such spaces 
we can describe. To define the Besov B^^ spaces we integrate over locations first, 
and then over scales 

(/(/(i-<-)i-)'?)"'?)"' 

To define the Triebel-Lizorkin ^ spaces we integrate over scales first and then 
over locations 

Here s = i7— 1/2), and we adopt a convention here and below of ignoring the 
low frequencies so that actually these formulas arc only correct for functions which 
are built from frequencies |^| > Aq; the correct general formulas would require an 
extra term for the low frequencies which will confuse the novice and be tedious 
for experts. Also for certain combinations of parameters p,q = l,oo for example, 
changes ought to be made, based on maximal functions, BMO norms, etc., but in 
this expository work we gloss over such issues. 

Each of these norms asks that the wavelet transform decay as we go to finer 
scales, and so controls the oscillations of the functions. Intuition about some of 
these spaces comes by thinking of a wavelet coefficient as something akin to a 
difference operator such as f{b + a) ~ "^fib) + f{b — a); the various norms on the 
continuous wavelet coefficients measure explicitly the finite differences and implicitly 



Emerging Applications of Geometric Multiscale Analysis 



213 



the derivatives of the analyzed functions. The distinctions between spaces come in 
the subtle aspects of choice of order of integrating in scale and in location and in 
choice of p and q. We get the following sequence of relations between the spaces 
defined by the F and B scales and classical spaces: 

• LP : LP F°2, 1< p < oo. 

• HP : HP ^ F^^, 0<p< 1. 

• Sobolev: wf - F™2, 1 < p < cx). 

• Holder: ^ B'z 



CXD,00 



There are also equivalences with non-classical, but very interesting, spaces, such as 
the Bump Algebra B\ and almost-equivalences to some other fundamental spaces. 



such as BV{R). The full story about such equivalence is told very well in |42, p6| . 

An important structural fact about these spaces is that they admit molecular 
decompositions; we can define molecules as functions obeying certain size, smooth- 
ness and vanishing moment conditions, which are localized near an interval of some 
scale and location, and then show that, although elements of these spaces are de- 
fined by norms on the continuum domain, functions belong to these spaces if and 
only if they can be written as superpositions f{x) = Aquiq^x) where uiq are 
molecules and the Aq are scalar coefficients, and where the coefficient sequence 
{Aq)q obeys certain norm constraints. Results of this kind first emerged in the 
1970's; a canonical way to get such results uses the CWT Consider the dyadic 
cells 

Q = {(a, 6) : 2"^ > a > 2-^^+^\ < b < (fc+ l)/2^}, 

note that they obey ~ 1; they are "unit cells' for the reference measure. It 

turns out that the behavior of W{a, b) at various points within such a cell Q stays 
roughly comparable and that the ^pa,b all behave similarly as well. As a 

result, the integral decomposition offered by the Calderon reproducing formula can 
sensibly be discretized into terms arising from different cells. 



fix) = J W{a,b)tpa^bix)fi{dadb) 

= y2 W{a,b)%lJa.b{x)^i{dadb) 

Q •'Q 

= VAfQ(a;), Mq = / W{a,b)%l:a,blJi{d(^db) 

Q -Iq 

= ^AQmQ(x), AQ = \\W{-,-)\\mQ) 
Q 

Now roughly speaking, each mg is a mixture of wavelets at about the same location 
and scale, and so is something like a wavelet, a coherent oscillatory waveform of a 
certain location and scale. This type of discretization of the Calderon reproducing 
formula has been practiced since the 1970's, for example by Calderon, and by Coif- 
man and Weiss [p^ who introduced the terms molecular decomposition (and 
atomic decomposition) for discrete series of terms localized near a certain scale and 
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location. Hence, the mq may be called molecules and the Aq represent the contri- 
butions of various molecules. The spaces ^ and ^ can then be characterized 
by the decomposition f{x) — Aqitiq^x): we can define sequence-space norms 
/p q as in ( |3.2| ) below for which 

II/IIf^'„ - II(^q)qII/,'„, 

and similarly for Besov sequence norms ^ . This gives a clear understanding of 
the structure of / in terms of the distribution of the number and size of oscillations 
across scales. 

While the molecular decomposition is very insightful and useful for proving 
structure theorems about functional spaces, it has two drawbacks which severely 
restrict practical applications. First, the Aq are nonlinear functionals of the un- 
derlying object /; secondly, the uiq are variable objects which depend on /. As 
a result, practical applications of the sum Aqtuq are not as straightforward 
as one might like. Starting in the early 1980's, it was found that a much simpler 
and more practical decomposition was possible; in fact with appropriate choice of 
generating wavelet - different than usually made in the CWT - one could have an 
orthonormal wavelet basis |^ , |l6| . 

/ = ^M/(2--'",A:/2^>2-.,fe/2. = (3.1) 

Essentially, instead of integrating over dyadic cells Q, it is necessary only to sample 
once per cell! Several crucial advantages in applications flow from the fact that the 
coefficients aj,k are linear in / and the i/'j.fc are fixed and known. 

A theoretical advantage flows from the fact that the same norm equivalence 
that was available for the amplitudes (Aq) in the molecular decomposition also 
applies for the wavelet coefficients: 




This implies that the wavelets ipj.k make an unconditional basis for appropriate 
spaces in the Besov and Triebel scales. This can be seen from the fact that the norm 
involves only \aj^k \ a fact which is quite different from the case with Fourier analysis. 
Unconditionality implies that the balls {/ : < A} are closely inscribed by 

and circumscribed by balls {/ : ||a||fcj^ < A'} which are quite simple geometric 
objects, solid and orthosymmetric with respect to the wavelets as 'principal axes'. 
This solid orthosymmetry is of central significance for the optimality of wavelets for 
many of the stylized applications mentioned below; compare [^o[ ^ 

Our last chapter in the mathematical development of classical multiscale meth- 
ods concerns the connection between Besov spaces and approximation spaces. In the 
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late 1960's, Jaak Peetre observed that the space B"^^ -^y^, a > 1 was very special. 
It served as the Approximation Space for approximation in norm by free knot 
splines, i.e. as the set of functions approximable at rate nT" by splines with n free 
knots. In the 1980's a more general picture emerged, through work of e.g. Brudnyi, 
DeVore, Popov and Peller [Q: that the space served as the approximation 
space of many nonlinear approximation schemes (e.g. rational functions), under 
LP approximation error, where 1/t = cr + 1/p. This says that although t < 1 at 
first seems unnatural (because graduate mathematical training emphasizes convex 
spaces) these nonconvex spaces are fundamental. The key structural fact is that 
those spaces are equivalent, up to renorming, to the set of functions whose wavelet 
coefhcients belong to an i'^ ball, t < 1. Hence, membership of wavelet coefficients 
in an P' ball for small r becomes of substantial interest. The intuitive appeal for 
considering P' balls is clear by considering the closely-related weak-V balls; they 
can be defined as the constants C in relations of the form 

/i{(a, b) : \W{a, 5)| > e} < Ce^i/^, e > 0, 

or 

#{{3,k) : |a,, fel > e} < Ce-l/^ e > 0. 

They are visibly measures of the sparsity in the time-scale plane, and hence sparsity 
of that plane controls the asymptotic behavior of numerous nonlinear approximation 
schemes. 

3.2. Stylized applications 

We now mention some stylized applications of classical multiscale thinking, i.e. 
applications in a model world where we can prove theorems in the model setting. 

3.2.1. Nonlinear approximation 

Since the work of D.J. Newman in the 1960's it was understood that approx- 
imation by rational functions could be dramatically better than approximation by 
polynomials; for example the absolute value function \t\ on the interval [—1, 1] can 
be approximated at an exponential rate in n by rational functions with numerator 
and denominator of degree n, while it can be approximated only at an algebraic rate 
by polynomials of degree n. While this suggests the power of rational approxi- 
mation, it must also be noted that rational approximation is a highly nonlinear and 
computationally complex process. 

On the other hand, from the facts (a) that wavelets provide an unconditional 
basis for Besov spaces, and (b) that certain Besov spaces are approximation spaces 
for rational approximation, we see that wavelets give an effective algorithm for the 
same problems where rational functions would be useful. Indeed, based on work by 
DeVore, Popov, Jawerth, Lucier we know that if we consider the class of functions 
approximable at rate « n~'^ by rational approximation, these same functions can 
be approximated at the same rate simply by taking a partial reconstruction based 
on the n "biggest" wavelet coefficients. 
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In short, from the viewpoint of asymptotic rates of convergence, thresholding 
of wavelet coefficients - a very weakly nonlinear approximation scheme - is fully as 
effective as best rational approximation. The same assertion can be made comparing 
nonlinear approximation by wavelets and by free knot splines. 

3.2.2. Data compression 

Consider the following mathematical idealization of data compression. We 
have a function which is an unspecified element of a Besov Ball T = {f : < 
A\ and we wish to have a coder/decoder pair which can approximate any such 
function to within an e-distance in norm by encoding into, and decoding from, 
a finite bitstring. 

In mathematical terms, we are studying the Kolmogorov e-entropy: we wish 
to achieve N{e,!F)^ the minimal number of bits required to represent every / in 
T to within an error e. This is known, since Kolmogorov and Tikhomirov, to 
behave as 

A^(e,J^) X e"i/'",e^ 0. (3.4) 

Now, up to renorming, the ball T is isometric to a ball in sequence space Q = 
{a : llajlbj^ < A}. Such a ball is a subset of wV for 1/t — a + 1/2 and each 
element in it can be approximated in l"^ error at a rate M~^/'^+^/^ by sparse vectors 
containing only M nonzero coefficients. Here is a simple coder inspired by this 
fact. Pick M{e) coefficients such that the ^^-error of such an approximation is at 
most (say) e/2. The M(e) coefficients achieving this can be quantized into integer 
multiples of a base quantum g, according to aj^k = [c^j.fc/'zj j with the quantum 
chosen so that the quantized vector a^"?-* defined by o'^]. = q ■ aj,k, approximates 
the original coefficients to within £^ error e/2. The resulting integers aj^k represent 
the function / to within error e and their indices can be coded into bit strings, 
for a total encoding length of not worse that 0(log(e~-^)M(e)) = 0(log(e^^)e^^/'^). 
Hence a very simple algorithm on the wavelet coefficients gets close to the optimal 
asymptotics (|3.4|)! Underlying this fact is the geometry of the body Q; because of 
its solid orthosymmetry, it contains many high-dimensional hypercubes of ample 
radius. Such hypercubes are essentially incompressible. 

In fact the log(e~"'^) factor is removable in a wide range of a,p,q. In many 
cases, the e-entropy can be attained, within a constant factor, by appropriate level- 
dependent scalar quantization of the wavelet coefficients followed by run-length en- 
coding. In other work, Cohen et al. have shown that by using the tree-organization 
of wavelet coefficients one can develop algorithms which give the right order of 
asymptotic behavior for the across many smoothness classes; e.g. p^ . 

In fact more is true. Suppose we use for Besov ball simply the ball {/ : 
|Ja(/)||f)<' < ^} based on wavelet coefiicients; then by transform coding as in 
|25|] we can get efficient codes with codelength precisely asymptotic equivalence 
to the Kolmogorov e-entropy by levelwise ^^-sphere vector quantization of wavelet 
coefficients. Underlying this fact, the representation of the underlying functional 
class as an orthosymmetric body in infinite-dimensional space is very important. 
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3.2.3. Statistical estimation 

Consider the following mathematical idealization of nonparametric curve esti- 
mation. We have an unknown function f(t) on [0, 1] which is an element of a Besov 
Ball T = {f : < A} We observe data Y from the white noise model 

Y{dt) = f(t)dt + eW{dt), 

where the W{t) is a Wiener process and e the noise level, and we wish to reconstruct 
/ accurately. We measure risk using the mean squared error 



and evaluate quality by the minimax risk 



min maxi?(:(/, /). 

Over a wide range of cr,p, q, this minimax risk tends to zero at the rate (e2)2(T/(2cr+i) ^ 
In this setting, some simple algorithms based on noisy wavelet coefficients 
yj,k = J ^j,k{d)Y{dt) can be quite effective. In effect, j/j ^ = aj^k +iZj^k, where Zj^k 
is a white Gaussian noise. By simply applying thresholding to the noisy wavelet 
coefficients of Y, 



at scales < j < log2(e with threshold -^2 log(e~i) , we obtain a new set of coef- 
ficients; using these we obtained a nonlinear approximation / ~ J2j k '^j,k4'j,k- The 
quantitative properties are surprisingly good; indeed, using again the w£'^ embed- 
ding of the Besov body 6^ ^ , we have that the £^-error of nonlinear approximation 
to a using AI terms converges at rate A/^^/^+^/^. Heuristically, the coefficients 
surviving thresholding have errors of size e, and the object can be approximated 
by at most M of these with i"^ error « M~^/'^+^/^; simple calculations suggest that 
the risk of the estimator is then roughly • M + Af where AI is the number 

of coefficients larger than e in amplitude; this is the same order as the minimax 
risk 6^°'/^^'^+^)! (Rigorous analysis shows that for this simple algorithm, log terms 
intervene [^.) If we are willing to refine the thresholding in a level-dependent way, 
we can obtain a risk which converges to zero at the same rate as the minimax risk 
as e — > 0, e.g. |Q. Moreover, if we are willing to adopt as our Besov norm the 
sequence space bp ,^ norm based on wavelet coefficients, then by applying a sequence 
of particular scalar nonlinearities to the noisy wavelet coefficients (which behave 
qualitatively like thresholds) we can get precise asymptotic equivalence to the min- 



imax risk, i.e. precise asymptotic minimaxity |32 . Parallel results can be obtained 
with wavelet methods in various inverse problems, where / is still the estimand, 
but we observe noisy data on K f rather than /, with K a linear operator, such as 
convolution or Radon transform 12111. 
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3.2.4. Fast computation 

An important tlieme for scientific computation is the sparse representation, 
not of functions, but of operators. For this purpose a central fact pointed out by 
Yves Meyer is that wavelets sparsify large classes of operators. Let T be a 
Calderon-Zygmund operator (CZO); the matrix representation of such operator in 
the wavelet basis 

then M is sparse - all its rows and columns have finite P norms for each p > 0. In 
short, such an operator involves interactions between very few pairs of terms. 

For implications of such sparsity, consider the work of Beylkin, Coifman, and 
Rokhlin ||^. Suppose T is a CZO, and let Comp{e,n) denote the number of flops 
required to compute an e-approximation to PnTPn, where P„ is an projector onto 
scales larger than 1/n. In |^ it was shown that, ignoring set-up costs, 

Co7np[e,n) = 0(log(l/e)n) 

so that such operators could be applied many times with cost essentially linear in 
problem size, as opposed to the O(n^) cost nominally demanded by matrix multipli- 
cation. The algorithm was roughly this: represent the operator in a wavelet basis, 
threshold the coefficients, and keep the large coefficients in that representation. A 
banded matrix results, which can be applied in order 0{n) flops. (The story is a 
bit more subtle, since the algorithm as written would suffer an additional 0(log(n)) 
factor; to remove this, Beylkin, Coifman, and Rokhlin's nonstandard form must be 
applied.) 

3.3. Applications 

The possibility of applying wavelets to real problems relies heavily on the 
breakthrough made by Daubechies (building on work of Mallat |^) which 
showed that it was possible to define a wavelet transform on finite digital signals 
which had orthogonality and could be computed in order n flops. Once this algo- 
rithm was available, a whole range of associated fast computations followed. Cor- 
responding to each of the 'stylized applications' just listed, many 'real applications' 
have been developed over the last decade; the most prominent are perhaps the use 
of wavelets as part of the JPEG-2000 data compression standard, and in a variety of 
signal compression and noise-removal problems. For reasons of space, we omit de- 
tails, referring the reader instead to and to various wavelet-related conferences 
and books. 

4. Need for geometric multiscale analysis 

The many successes of classical multiscale analysis do not exhaust the oppor- 
tunities for successful multiscale analysis. The key point is the slogan wc formulated 
earlier - Information has its own architecture. In the Information Era, where new 
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data sources are proliferating endlessly, each with its own peculiarities and specific 
phenomena, there is a need for expansions uniquely adapted to each type of data. 

In this connection, note that classical wavelet analysis is uniquely adapted to 
objects which are smooth apart from point singularities. If a function is C°° except 
for step discontinuities at a finite set of points, its continuous wavelet transform 
will be very sparse. In consequence, the decreasing rearrangement of its wavelet 
coefficients will decay rapidly, and n-term approximations to the object will converge 
rapidly in norm. With the right definitions the story in high dimensions is 
similar: wavelets give a sparse representation of point singularities. 

On the other hand, for singularities along lines, planes, curves, or surfaces, the 
story is quite different. For functions in dimension 2 which are discontinuous along 
a curve, but otherwise smooth, the 2-dimensional CWT will not be sparse. In fact, 
the the decreasing rearrangement of its wavelet coefficients will decay like C/N, 
and A''-term approximations to the object will converge no faster than 0{N~^) in 
squared norm. Similar statements can be made for singularities of dimension 
< k < d in dimension d. In short, wavelets are excellent for representing smooth 
data containing point singularities but not singularities of intermediate dimensions. 

There are many examples of data where singularities of intermediate dimen- 
sions constitute important features. One example comes from extragalactic astron- 
omy, where gravitational clustering has caused matter to congregate in 'filaments' 
and 'sheets' in 3-dimensions. Another example comes from image analysis, say of 
SAR imagery, where stream beds, ridge lines, roads and other curvilinear phenom- 
ena punctuate the underlying background texture. Finally, recently-developed tools 
for 3D imaging offer volumetric data of phsyical objects (eg biological organs) where 
sheetlike structures are important. 

We can summarize our vision for the future of multiscale analysis as follows. 

If it is possible to sparsely analyze objects which are smooth apart from 
intermediate-dimensional singularities, this may open new vistas in mathemat- 
ical analysis, offering (a) new functional Spaces, and (b) new representation of 
mathematically important operators. 

If, further, it is possible algorithmize such analysis tools, this would open new 
applications involving (a ) data compression; (b ) noise removal and recovery from 
Ill-posed inverse problems; (c) feature extraction and pattern recognition; and (d) 
fast solution of differential and integral equations. 

But can we realistically expect to sparsely analyse such singularities? By 
considering Calderon-like formulas, we can develop some understanding. 

4.1. Ridgelet analysis 

We consider first the case of singularities of co-dimension 1. It turns out that 
the ridgelet transform is adapted to such singularities. 

Starting from an admissible wavelet ip, define the ridgelet Pa.b,e{x) — ijja^biugx), 
where ug is a unit vector pointing in direction 6 and so this is a wavelet in one direc- 
tion and constant in orthogonal directions |6| . In analogy to the continuous wavelet 
transform, define the continuous ridgelet transform Rf[a,b,9) = {pa.b,e,f)- There 
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is a synthesis formula 

f{x) = J Rf{a,b,9)pa,b,e{x)iJ,{dadbd9) 
and a Parseval relation 

Il/ll2 = J Rf{a,b, efp{dadbde) 

both valid for an appropriate reference measure /i. Note the similarity to the 
Calderon formula. 

In effect this is an analysis of / into contributions from 'fat planes'; it has 
been extensively developed in Emmanuel Candes' Stanford thesis (1998) and later 
publications. Suppose we use it to analyze a function f{x) G L^(R") which 
is smooth apart from a singularity across a hyperplane. If our function is, say, 
fu,a{x) — l{«'i;>a}e^"'^" , Candes showed that the ridgelet transform of fu,a is 
sparse. For example, a sampling of the continuous ridgelet transform at dyadic lo- 
cations and scales and directions gives a set of coefficients such that the rearranged 
ridgelet coefficients decay rapidly. It even turns out that we can define "orthonormal 
ridgelets" (which are not true ridge functions) such that the orthonormal ridgelet 
coefficients are sparse: they belong to every with p > |Q. In short, an 
appropriate multiscale analysis (but not wavelet analysis) successfully compresses 
singularities of co-dimension one. 



4.2. fc-plane ridgelet transforms 

We can develop comparable reproducing formulas of co-dimension k in R'^. 
If Pk denotes orthoprojector onto a fc-plane in i?'', and -0 an admissible wavelet 
for fc-dimensional space, we can define a fc-plane Ridgelet: Pa,b,Pk{x) = iJJa,biPkx) 
and obtain a A:-plane ridgelet analysis: Rf{a, b, Pk) ~ {pa.b,PkJ /)■ We also obtain a 
reproducing formula 

f{x) = j Rf{a,b, Pk)pa.b,Pk{x)p{dadb dPk) 

and a Parseval relation II/II2 = J Rf{a,b,Pk)^p{dadbdPk), with in both cases p{) 
the appropriate reference measure. In short we are analyzing the object / into 
'Fat Lines', 'Fat fc-planes,' 1 < fc < n — 1. Compare Unfortunately, all such 
representations have drawbacks, since to use them one must fix in advance the 
co-dimension fc; moreover, very few singularities are globally flat! 

4.3. Wavelet transforms for the full afRne group 

A more ambitious approach is to consider wavelets indexed by the general afhne 
group GA{n); defining {ijjA.bg){x) ~ ^{Ax + b) ■ \A\^/'^. This leads to the wavelet 
analysis Wf{A,b) = {ipA,b,f)- Taking into account the wide range of 'anisotropic 
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dilations and directional preferences poossible within such a scheme, we are analyz- 
ing / by waveforms which represent a very wide range of behaviors: 'Fat Points', 
'Fat Line Segments', 'Fat Patches', and so on. 

This exciting concept unfortunately fails. No matter what wavelet we pick 
to begin with, J Wf{A,b)^ ^{dAdb) = +00. (technically speaking, we cannot get a 
square-integrable representation of the general affine group; the group is too large) 
0, 1^. Moreover, synthesis fails: / Wf{A,b)il;a,b{^)f^(.dAdb) is not well-defined. 
Finally, the transform is not sparse on singularities. 

In short, the dream of using Calderon-type formulas to easily get a decomposi- 
tion of piecewise smooth objects into 'Fat Points', 'Fat Line Segments', 'Fat Surface 
Patches', and so on fails. Success will require hard work. 

4.4. A cultural lesson 

The failure of soft analysis is not unexpected, and not catastrophic. As Jerzy 
Neyman once said: life is complicated, but not uninteresting. As Lennart Carleson 
said: 

There was a period, in the 1940's and 1950's, when classical analysis 
was considered dead and the hope for the future of analysis was consid- 
ered to be in the abstract branches, specializing in generalization. As is 
now apparent, the death of classical analysis was greatly exaggerated ... 
the reasons for this ... [include] ... the realization that in many prob- 
lems complications cannot be avoided, and that intricate combinatorial 
arguments rather than polished theories are in the center. 

Our response to the failure of Calderon's formula for the full Ax + b group was 
to consider, in the ICM Lecture, two specific strategies for decomposing multidi- 
mensional objects. In the coming section, we will consider analysis using a special 
subset of the Ax + b group, where a Calderon-like formula still applies, and we can 
construct a fairly complete analog of the wavelet transform - only one which is effi- 
cient for singularities of co-dimension 1. In the lecture (but not in this article), we 
also considered analysis using a fairly full subset of the Ax + b group, but in a sim- 
plified way, and extracted the results we need by special strategies (viz. Carleson's 
"intricate combinatorial arguments") rather than smooth general machinery. The 
results delivered in both approaches seem to indicate the correctness of the vision 
articulated above. 

5. Geometric multiscale analysis 'with Calderon' 

In harmonic analysis since the 1970's there have been a number of important 
applications of decompositions based on parabolic dilations 

fa{xi,X2) = fi{a^^^Xi,aX2), 

so called because they leave invariant the parabola X2 — x\. Calderon himself used 
such dilations Q and exhibited a reproducing formula where the scale variable acted 
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through such dilations. Note that in the above equation the dilation is always twice 
as strong in one fixed direction as in the orthogonal one. 

At the same time, decompositions began to be used based on directional 
parabolic dilations of the form 

fa,e{xi,X2) = fa{Re{xi,X2y). 

Such dilations (essentially) leave invariant curves defined by quadratic forms with 
9 as one of the principal directions. For example, Charles Fefferman in effect used 
decompositions based on parabolic scaling in his study of Bochner-Riesz summa- 
bility citeFefferman. Elias Stein used decompositions exhibiting parabolic scaling 
in studying oscillatory integrals in the 1970's and 1980's In the 1990's, Jean 
Bourgain, Hart Smith, Chris Sogge, and Elias Stein found applications in the study 
of oscillatory integrals and Fourier Integral operators. 

The principle of parabolic scaling leads to a meaningful decomposition reminis- 
cent of the continuous wavelet transform, only with a much more strongly directional 
character. This point has been developed in a recent article of Hart Smith |3^, who 
defined a continuous wavelet transform based on parabolic scaling, a notion of di- 
rectional molecule, showed that FIO's map directional molecules into directional 
molecules, and showed that FIO's have a sparse representation in a discrete decom- 
position. For this expository work, we have developed what seems a conceptually 
simple, perhaps novel way of approaching this topic, which we hope will be accesible 
to non-experts. Details underlying the exposition are available from [ p6[ . 

5.1. Continuous directional multiscale analysis 

We will work exclusively in R^, although everything generalizes to higher 
dimensions. Consider a family of directional wavelets with three parameters: scale 
a > 0, location b e R'^ and orientation 6 e [0, 27r). The orientation and location 
parameters are defined by the obvious rigid motion 

ipa,b.e = -ipafifii^eix - b)) 

with Rg the 2-by-2 rotation matrix effecting planar rotation by 9 radians. At 
fine scales, the scale parameter a acts in a slightly nonstandard fashion based on 
parabolic dilation, in the polar Fourier domain. We pick a wavelet V'l.o with ijj of 
compact support away from 0, and a bump (/)i_o supported in [—1,1]. Here ipifi 
should obey the usual admissibility condition and ||(/)||2 = 1. At sufficiently fine 
scales (say a < 1/2) we define the directional wavelet by going to polar coordinates 
(r, uj) and setting 

In effect, the scaling is parabolic in the polar variables r and lu, with lu being 
the 'thin' variable; thus in particular the wavelet ipa,o,o is not obtainable by affine 
change-of-vartiables on ipa'fi.o for a' ^ a. We omit description of the transform at 
coarse scales, and so again ignore low frequency adjustment terms. Note that it is 
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correct to call these wavelets directional, since they become increasingly needle-like 

at fine scales. 

Equipped with such a family of high-frequency wavelets, we can define a Di- 
rectional Wavelet Transform 

DW{a, b, 0) = {iPaAO^ /), a > 0, 5 G R2, e e [0, 2tt) 

It is easy to see that we have a Calderon-like reproducing formula, valid for high- 
frequency functions: 

fix) = j DW{a,b,0)iJa,bAx)Kdadbd0) 
and a Parseval formula for high-frequency functions: 

||/||2, = j DW{a,b,0fn{dadbd0) 

in both cases, n denotes the reference measure -tt^-tt?— • 

Based on this transform, we can define seminorms reminiscent of Besov and 
Triebel seminorms in wavelet analysis; while it is probably a major task to prove 
that thse give well-founded spaces, and such work has not yet been done (for the 
most part), it still seems useful to use these as a tool measuring the distribution 
of a function's 'content' across scale, location and direction. We get a directional 
Besov-analog DB^^: integrating over locations and orientations first 

(/(/„W,,M,l«-.^^J.)"'$)"' 

and a Trieb el-analog DF^ ^ by integrating over scales first 

/ {J(\DWiaAe)\a-r^^y\0db 

In both cases we take s = a — 3/2(l/p — 1/2). (There is the possibility of defining 
spaces using a third index (eg -Bp,g,r) corresponding to the norm in the variable, 
but we ignore this here). As usual, the above formulas can only provide norms for 
high-frequency functions, and would have to be modified at coarse scales if any low 
frequencies were present in /. As in the case of the continuous wavelet transform 
for R, there is some heuristic value in considering the transform as measuring finite 
directional differences e.g. f {b+ae0) — 2f{b)+f {b—aeg), where e0 = (cos(0), sin(^))'; 
however this view is ultimately misleading. It is better to think of the transform 
as comparing the difference between polynomial approximation localized to two 
different rectangles, one of size a by ^/a and the other, concentric and co-oriented, 
of size 2a by V2a. 

The transform is actually performing a kind of microlocal analysis of / far more 
subtle than what is possible by simple difference/ differential expressions. Indeed, 
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consider the Heaviside H{x) — l^x-^yoy, then at fine scales DW{a,0,9) = for 
l^l > Va and DW{a, 0,0) w a^/^ for \e\ < y/^, so that J^"" \DW{a,0,e)\de < 
Ccc'l^ as a — > 0. In short, DW is giving very precisely the orientation of the 
singularity. Moreover, for h ^ (0,0:2)', \DW [a,0,d)\dB — > rapidly as a ^ 0. 
So the transform is localizing the singularity quite well at fine scales, in a way 
that is difficult to imagine simple differences being able to do. Interpreting the 
above observations, we learn that a smoothly windowed Heaviside ](x) ~ H{x)e~^ 
belongs in DB^ ^ but not in any better space DB^ ^, a > 0, while it belongs 
in DBl and not in any better space DB" a > 1. The difference between the 
critical indices in these cases is indicative of the sensitivity of the p = \ seminorms 
to sparsity. Continuing in this vein, we have that for weak embeddings, for each 
?7 > 

/i{(a,6,6i) : \DW{a,h,e)\ > e} < Ce-O/s--)) 

so that the space-scale-direction plane for the (windowed) Heaviside is almost in 
L'^^^{ijl) ^ ^-^2^3 2/3; the Heaviside has something like 3/2-derivatives. In compar- 
ison, the wavelet expansion of the Heaviside is only in , so the expansion is denser 
and 'more irregular' from the wavelet viewpoint than from the directional wavelet 

viewpoint. For comparison, the Dirac mass S belongs at best to B^^ and B^ ^ 

3/2 1/2 

while it belongs at best to DBoc.oo and DB^ . The 'point singularity' is more reg- 
ular from the wavelet viewpoint than from the directional wavelet viewpoint, while 
the Heaviside is more regular from the directional wavelet viewpoint than from the 
wavelet viewpoint. In effect, the Dirac 'misbehaves in every direction', while the 
Heaviside misbehaves only in one direction, and this makes a big difference for the 
directional wavelet transform. 

There are two obvious special equivalences: first, ~ DF22 ^ DB22 and 
L?' Sobolev ~ DF22 ^ DB™2- There are in general no other equiva- 

lences. Outside the Sobolev scale, the only equivalence with a previously pro- 
posed space is with Hart Smith's "Hardy Space for Fourier Integral Operators" 
113 ■ ^F/o ^ ^^12- This space has a molecular decomposition into directional 
molecules, which are functions that, at high frequency, are roughly localized in 
space to an a by rectangle and roughly localized in frequency to the dual rect- 
angle rotated 90 degrees, using traditional ways of measuring localization, such as 
boundedness of moments of all orders in the two principal directions. Under this 
qualitative definition of molecule. Smith showed that Ti-pjo has a molecular decom- 
position / = Aqtuq^x) in which the coefficients obey an £^ norm summability 
condition ^ |Aq|2-''^/'' < 1 when the directional molecules are normalized. This 
is obviously the harbinger for a whole theory of directional molecular decomposi- 
tions. 

More generally, one can make a molecular decomposition of the directional 
Besov and directional Triebel classes by discretizing the directional wavelet trans- 
form according to tiles Q = Q{j, ki, k2, £) which obey the following desiderata: 

• In tile Q{j, ki, k2,£), scale a runs through a dyadic interval > a > 2~^^~^^\ 

• At scale , locations run through rectangularly shaped regions with aspect 
ratio roughly 2^^ by 2^^/^. 
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• The location regions are rotated consistent with the orientation 

• The tile contains orientations running through 27r^/2J/2 < gi ^ 27r(^+ l)/2J'/2_ 

Note again that for such tiles n{Q) ~ 1. Over such tiles different values of DW{a, bO) 
are roughly comparable and different wavelets V'a,6,6i as well. Hence it is sensible to 
decompose 

f{x) = J DW{a,b,e)^a,bAxMdadbd9) 

= V / DW{a,b,0)tpabe{x)n{dadbd0) 
Q Jq 

= ^MQ(a;), Mq{x) 
Q 

= ^AQTUgix), Aq 

Q 

Morever, for any decomposition into directional molecules (not just the ap- 
proach above), the appropriate sequence norm of the amplitude coefficients gives 
control of the corresponding directional Besov or directional Triebel norm. It is 
then relatively immediate that one can define sequence space norms for which we 
have the norm equivalences 

ll/IbB,^,,^ II(^q)qIU6j„, ll/lli^F,^., ^I1(^q)qIU/-„ (5.5) 

where we again omit discussion of low frequency terms. The sequence space equiv- 
alence db^l 2 ^ 2 ^ are trivial. An interesting equivalence of relevance to the 

Heaviside example above is ^^2/3 2/3 ^ f^^'^-. so that, again, a smoothness space 
with "p < 1" is equivalent to an P' ball with r < 1. 

Hart Smith made the crucial observation that the molecules for the Smith space 
are invariant under diffeomorphisms. That is, if we take a C°° diffeomorphism </>, 
and a family of 'hOpjQ molecules (such as mQ{x)), then every rhQ{x) = mQ{4>{x)) 
is again a molecule, and the sizes of moments defining the molecule property are 
comparable for mq and for mg . It follows that HpjQ is invariant under diffeomor- 
phisms of the base space. His basic lemma underlying this proof was strong enough 
to apply to invariance of directional molecules in every one of the directional Besov 
and directional Triebel classes. Hence directional Besov and directional Triebel 
classes are invariant under diffeomorphisms of the base space. 

This invariance enables a very simple calculation, suggesting that the direc- 
tional wavelet transform sparsifies objects with singularities along smooth curves, or 
at least sparsifies such objects to a greater extent that does the ordinary wavelet 
transform. Suppose we analyse a function / which is smooth away from a disconti- 
nuity along a straight line; then the Heaviside calculation we did earlier shows that 
most directional wavelet coefficients are almost in weak L^/^. Now since objects 
with linear singularities have ^2/^+^ boundedness of amplitudes in a molecular de- 
composition, and directional molecules are diffeomorphism invariant, this sparsity 



= / DW{a,b,9)iJa,b,e{x)l^{dadbd0) 
JQ 

= \\DW{a,b,e)h.^Q) 
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condition is invariant under difFeomorphisms of the underlying space. It follows 
that an object which is smooth away from a discontinuity along a smooth curve 
should also have molecular amplitudes in P'l'^^'^ . 

This sparsity argument suggests that directional wavelets outperform wavelets 
for representing such geometric objects. Indeed, for 77 > there is an e > so that 
£2/3+e boundedness of directional wavelet molecular amplitudes shows that approxi- 
mation by sums of N directional molecules allows a squared-L^ approximation error 
of order 0(iV~^+''), whereas wavelet coefficients of such objects are only in so 
sums of N wavelets only allow squared-L^ approximation error of size 0{N~^\ 

5.2. Stylized applications 

The above calculations about sparsification of objects with curvilinear singu- 
larities suggests the possibility of using the directional wavelet transform based on 
parabolic scaling to pursue counterparts of all the various classical wavelet appli- 
cations mentioned in Section 3: nonlinear approximation, data compression, noise 
removal, and fast computations. It further suggests that such directional wavelet 
methods might outperform calssical wavelets - at least for objects containing sin- 
gularities along smooth curves, i.e. edges. 

5.2.1. First discretization: curvelets 

To develop applications, molecular decomposition is (once again) not enough: 
some sort of rigid decomposition needs to be developed; an orthobasis, for example. 

Candes and Donoho developed a tight frame of elements exhibiting parabolic 
dilations which they called curvelets, and used it to systematically develop some of 
these applications. A side benefit of their work is knowledge that the transform 
is essentially optimal, i.e. that there is no fundamentally better scheme of nonlin- 
ear approximation. The curvelet system has a countable collection of generating 
elements j^{xi,X2) , /i ~ io,j,bk-^.k2T^iTtm) which code for scale, location, and di- 
rection. They obey the usual rules for a tight frame, namely, the reconstruction 
formula and the Parseval relation: 

The transform is based on a series of space/frequency localizations, as follows. 

• Bandpass filtering. The object is separated out into different dyadic scale 
subbands, using traditional bandpass filtering with passband centered around 

1^1 e [2^2^+1]. 

• Spatial localization. Each bandpass object is then smoothly partitioned spa- 
tially into boxes of side 2^^/^. 

• Angular localization. Each box is analysed by ridgelet transform. 

The frame elements are essentially localized into boxes of side 2^^ by 2^^/^ at 
a range of scales, locations, and orientations, so that it is completely consistent with 
the molecular decomposition of the directional Besov or directional Fourier classes. 
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However, unlike the molecular decomposition, the coefficients are linear in / and 
the frame elements are fixed elements. Moreover, an algorithm for application to 
real data on a grid is relatively immediate. 

5.2.2. Nonlinear approximation 

In dimension 2, the analog to what was called free knot spline approximation is 
approximation by piecewise polynomials on triangulations with N pieces. This idea 
has generated a lot of interest but frustratingly few hard results. For one thing, it is 
not obvious how to build such triangulations in a way that will fulfill their apparent 
promise, and in which the resulting algorithm is practical and possible to analyze. 

Here is a class of two-dimensional functions where this scheme might be very 
attractive. Consider a class T of model 'images' which exhibit discontinuities across 
smooth curves. These 'images' are supposed to be away from discontinuity. 
Moreover, we assume uniform control both of the norm for the discontinuity 
curve and smooth function. One can imagine that very fine needle-like triangles 
near curved discontinuities would be valuable; and this is indeed so, as p7| shows; 
in an ideal triangulation one geta a squared error converging at rate N'^wheveets 
adaptive quadtrees and other simpler partitioning schemes give only conver- 
gence. Moreover, this rate is optimal, as shown in if we allow piecewise smooth 
approximation on essentially arbitrary triangulations with N pieces, even those de- 
signed by some as yet unknown very clever and very nonlinear algorithm, we cannot 
in general converge to such objects faster than rate N^'^. 

Surprisingly, a very concrete algorithm does almost this well: simply thresh- 
olding the curvelet coefficients. Candes and Donoho have shown the following |^ 

Theorem: The decreasing rearrangement of the frame coefficients in the 
curvelet system obeys the following inequality for all f G J-: 

|a|(fc) <CA:-3/2iog3/2(fc)^ k>i. 

This has exactly the implication one would have hoped for from the molecu- 
lar decomposition of directional Besov classes: the frame coefficients are in ^2/3+£ 
for each e > 0. Hence, we can build an approximation to a smooth object with 
curvilinear discontinuity from N curvelets with squared L -error log^(iV) • as 
mentioned earlier, Wavelets would give squared L^-error > cN^^. 

In words: approximation by sums of the iV-biggest curvelet terms does essen- 
tially as well in approximating objects in as free-triangulation into N regions. 
In a sense, the result is analogous to the result mentioned above in Section 3.2.1 
comparing wavelet thresholding to nonlinear spline approximation, where we saw 
that approximation by the iV-biggest amplitude wavelet terms does as well as free- 
knot splines with N knots. There has been a certain amount of talk about the 
problem of characterizing approximation spaces for approximation by N arbitrary 
triangles; while this problem seems very intractable, it is clear that the directional 
Besov classes provide what is, at the moment, the next best thing. 



5.2.3. Data compression 
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Applying just the arguments already given in the wavelet case show that the 
result of nonlinear approximation by curvelets, combined with simple quanti- 
zation, gives near-optimal compression of functions in the class T above, i.e. the 
number of bits in the compressed representation is optimal to within some polylog 
factor. This seems to promise some interesting practical coders someday. 



5.2.4. Noise removal 

The results on nonlinear approximation by thresholding of the curvelet coef- 
ficients have corresponding implications in statistical estimation. Suppose that we 
have noisy data according to the white noise model 

Y{dxi, dx2) = f{xi,X2)dxidx2 + eW{dxi, dx2) 

where W is a Wiener sheet. Here / comes from the same Tmage Model' T discussed 
earlier, of smooth objects with discontinuities across smooth curves. We mea- 
sure risk by Mean Squared Error, and consider the estimator that thresholds the 
curvelet coefficients at an appropriate (roughly 2-y/log(e^i)) multiple of the noise 
level. Emmanuel Candes and I showed the following |9[|: 

Theorem: Appropriate thresholding of curvelet coefficients gives nearly the 
optimal rate of convergence; with polylog{e) a polynomial in log(l/e), the estimator 
/"^"^ obeys 

Reif, F^) < polylogie) ■ minmaxi?,(/, /). 

Hence, in this situation, curvelet thresholding outperforms wavelet threshold- 
ing at the level of rates: 0[polylog{e)-e^/^) vs 0{e). Similar results can be developed 
for other estimation problems, such as the problem of Radon inversion. There the 
rate comparison is polylogie) ■ e^l^ vs log(l/e) • e^/'^; In empirical work p^ , 
we have seen visually persuasive results. 



5.2.5. Improved discretization: directional framelets 

The curvelet representation described earlier is a somewhat awkward way of 
obtaining parabolic scaling, and also only indirectly related to the continuum di- 
rectional wavelet transform. Candes and Guo suggested a different tight frame 
expansion based on parabolic scaling. Although this was not introduced in such a 
fashion, for this exposition, we propose an alternate way to understand their frame, 
simply as discretizing the directional wavelet transform in a way reminiscent of 



(3.1); for details, see Assuming a very specific choice of directional wavelet, 

one can get (the fine scale) frame coefficients simply by sampling the directional 
wavelet transform, obtaining a decomposition 

/ = ^i?l^(2-^5^fc;%^,27rV2J'/2)i^2-.,fc/2^2^£/2./= = ^aj.fe,ii^j,fc,;,say ; 



as usual, this is valid as written only for high-frequency functions). In fact this 

2 



can yield a tight frame, in particular the Parseval relation ^Ylijki'^jki ~ II /Ilia 
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This has conceptual advantages: a better relationship to the continuous directional 
wavelet transform and perhaps an easier path to digital representation. In compar- 
ison with the original curvelets scheme, curvelets most naturally organizes matters 
so that 'within' each location we see all directional behavior represented, whereas di- 
rectional framelets most naturally organize matters so that 'within' each orientation 
we see all locations represented. 

5.2.6. Operator representation 

Hart Smith, at the Berlin ICM, mentioned that decompositions based on 
parabolic scaling were valuable for understanding Fourier Integral Operators (FIO's) 
]3^ ; in the notation of our paper, his claim was essentially that FIO's of order 
zero operate on fine-scale directional molecules approximately by performing well- 
behaved affine motions - roughly, displacement, scaling and change of orientation. 
Underlying his argument was the study of families of elements generated from a 
single wavelet by true affine parabolic scaling 4>a,b,0{x) = 4>{Pa o Re o Sbx) where 
Pa = diagia, ^/a) is the parabolic scaling operator and SbX = a; — 5 is the shift. 
Smith showed that if T is an FIO of order and </) is directionally localized, the 
kernel 

■^aiV = (0a,b,e,T0a',b',e') 

is rapidly decaying in its entries as one moves away from 'the diagonal' in an ap- 
propriate sense. 

Making this principle more adapted to discrete frame representations seems 
an important priority. Candes and Demanet have recently announced that 
actually, the matrix representation of FIOs of order in the directional framelet 
decomposition is sparse. That is, each row and column of the matrix will be in £p 
for each p > 0. in a directional wavelet frame. This observation is analogous in 
some ways to Meyer's observation that the orthogonal wavelet transform gives a 
sparse representation for Calderon-Zygmund operators. Candes has hopes that this 
sparsity may form some day the basis for fast algorithms for hyperbolic PDE's and 
other FIO's. 

5.3. Applications 

The formalization of the directional wavelet transform and curvelet transform 
are simply too recent to have had any substantial applications of the 'in daily use 
by thousands' category. Serious deployment into applications in data compression 
or statistical estimation is still off in the future. 

However, the article |^ points to the possibility of immediate effects on re- 
search activity in computational neuroscience, simply by generating new research 
hypothesis. In effect, if vision scientists can be induced to consider these new types 
of image representation, this will stimulate meaningful new experiments, and re- 
analyses of existing experiments. 

To begin with, for decades, vision scientists have been influenced by mathe- 
matical ideas in framing research hypotheses about the functioning of the visual 
cortex, particular the functioning of the VI region. In the 1970's, several authors 
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suggested that the early visual system does Fourier Analysis; by the 1980's the 
cutting edge hypothesis became the suggestion that the early visual system does 
Gabor Analysis; and by the 1990's, one saw claims that the early visual system 
does a form of wavelet analysis. While the hypotheses have changed over time, the 
invariant is that vision scientists have relied on mathematics to provide language & 
intellectual framework for their investigations. But it seems likely that the hypothe- 
ses of these previous decades are incomplete, and that to these should be added the 
hypothesis that the early visual system performs a directional wavelet transform 
based on parabolic scaling. During my Plenary Lecture, biological evidence was 
presented consistent with this hypothesis, and a proposal was made that future 
experiments in intrinsic optical imaging of the visual cortex ought to attempt to 
test this hypothesis. See also p9[ . 



6. Geometric multiscale analysis 'without Calderon' 

In the last section we considered a kind of geometric multiscale analysis em- 
ploying a Calderon-like formula. In the ICM Lecture we also considered dispensing 
with the need for Calderon formulas, using a cruder set of multiscale tools, but one 
which allows for a wide range of interesting applications - very different from the 
applications based on analysis/synthesis and Parseval. Our model for how to get 
started in this direction was Peter Jones' travelling salesman problem. Jones con- 
sidered instead a countable number of points X — {xi} in [0, 1]^ and asked: when 
can the points of X be connected by a finite length (rectifiable) curve ? And, if 
they can be, what is the shortest possible length? Jones showed that one should 
consider, for each dyadic square Q such that the dilate 3Q intersects X, the width 
WQ of the thinnest strip in the plane containing all the points in Xn3Q, and define 
Pq = wq / diam{Q) the proportional width of that strip, relative to the sidelength 
of Q. As Pq — when the data lie on a straight line, this is precisely a measure 
of how close to linear the data are over the square Q. He proved the there is a 
finite-length curve F visiting all the points in X = {xi] iff 'Y^ 0Qdia'm{Q) < oo. 
I find it very impressive that analysis of the number of points in strips of various 
widths can reveal the existence of a rectifiable curve connecting those points. In 
our lecture, we discussed this idea of counting points in anistropic strips and several 
applications in signal detection and pattern recognition ^ ^, with applications in 
characterizing galaxy clustering |34[ . We also referred to interesting work such as 
Gilad Lerman's thesis under the direction of Coifman and Jones, and to |]30| , 
which surveys a wide range of related work. Look to p6| for an extended version 
of this article covering such topics. 



7. Conclusion 

Important developments in 'pure' harmonic analysis, like the use of parabolic 
scaling for study of convolution operators and FIOs, or the use of anisotropic strips 
for analysis of rectifiable measures, did not arise because of applications to our 
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developing 'information society', yet they seem to have important stylized appli- 
cations which point clearly in that direction. A number of enthusiastic applied 
mathematicians, statisticians, and scientists are attempting to develop true 'real 
world' applications. 

At the same time, the fniitful directions for new kinds of geometric multiscale 
analysis and the possible limitations to be surmounted remain to be determined. 
Stay tuned! 
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